UniMelb_NLP-CORE: Integrating predictions from multiple domains and feature sets for estimating semantic textual similarity
Abstract
In this paper we present the systems we submitted to the Semantic Textual Similarity shared task at *SEM 2013, which calculate the degree of semantic similarity between two texts. Our systems predict similarity using a regression over features drawn from three sources of information: string similarity, topic distributions of the texts based on latent Dirichlet allocation, and similarity between the documents returned by an information retrieval engine when the target texts are used as queries. We also explore methods for integrating predictions made with different training datasets and feature sets. Our best system was ranked 17th out of 89 participating systems. In our post-task analysis, we identify simple changes to our system that further improve our results.
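As a rough illustration of "a regression over features", the sketch below fits a small ridge-regularized least-squares model over two string-similarity features (token Jaccard overlap and a character-level ratio). It is not the authors' implementation: the feature choices, the helper names (`features`, `solve`, `fit`, `predict`), the ridge value, and the four training pairs with their scores are all assumptions made up for the example, and the LDA and information-retrieval features mentioned in the abstract are left out.

```python
from difflib import SequenceMatcher


def features(a, b):
    """Bias term plus two toy string-similarity features (hypothetical choices)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    jaccard = len(ta & tb) / len(union) if union else 1.0
    char_ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return [1.0, jaccard, char_ratio]


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit(pairs, scores, ridge=1e-3):
    """Least-squares weights from the normal equations.

    A small ridge term is added to every diagonal entry (including the
    bias) so the system stays solvable on tiny datasets.
    """
    X = [features(a, b) for a, b in pairs]
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * s for r, s in zip(X, scores)) for i in range(k)]
    return solve(XtX, Xty)


def predict(w, a, b):
    """Predicted similarity score for one text pair."""
    return sum(wi * fi for wi, fi in zip(w, features(a, b)))


# Invented training pairs with invented gold scores on a 0-5 scale.
pairs = [
    ("a man is playing a guitar", "a man plays the guitar"),
    ("a dog runs in the park", "a dog is running in a park"),
    ("the cat sleeps", "stocks fell sharply today"),
    ("a woman is cooking", "the train leaves at noon"),
]
scores = [4.5, 4.2, 0.2, 0.0]
w = fit(pairs, scores)
```

Calling `predict(w, text_a, text_b)` then scores a new pair; with so little data the weights are not meaningful beyond showing the mechanics.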
Similar resources
CLaC-CORE: Exhaustive Feature Combination for Measuring Textual Similarity
CLaC-CORE, an exhaustive feature combination system ranked 4th among 34 teams in the Semantic Textual Similarity shared task STS 2013. Using a core set of 11 lexical features of the most basic kind, it uses a support vector regressor which uses a combination of these lexical features to train a model for predicting similarity between sentences in a two phase method, which in turn uses all combi...
ECNUCS: Measuring Short Text Semantic Equivalence Using Multiple Similarity Measurements
This paper reports our submissions to the Semantic Textual Similarity (STS) task in ∗SEM Shared Task 2013. We submitted three Support Vector Regression (SVR) systems in core task, using 6 types of similarity measures, i.e., string similarity, number similarity, knowledge-based similarity, corpus-based similarity, syntactic dependency similarity and machine translation similarity. Our third syst...
MayoClinicNLP-CORE: Semantic representations for textual similarity
The Semantic Textual Similarity (STS) task examines semantic similarity at a sentence level. We explored three representations of semantics (implicit or explicit): named entities, semantic vectors, and structured vectorial semantics. From a DKPro baseline, we also performed feature selection and used source-specific linear regression models to combine our features. Our systems placed 5th, 6th, an...
UQeResearch: Semantic Textual Similarity Quantification
This paper presents an approach for estimating the Semantic Textual Similarity of full English sentences as specified in Shared Task 2 of SemEval-2015. The semantic similarity of sentence pairs is quantified from three perspectives: structural, syntactical, and semantic. The numerical representations of the derived similarity measures are then applied to train a regression ensemble. Although non...
IBM_EG-CORE: Comparing multiple Lexical and NE matching features in measuring Semantic Textual similarity
In this paper we present the systems with which we participated in the Semantic Textual Similarity task at *SEM 2013. The Semantic Textual Similarity Core task (STS) computes the degree of semantic equivalence between two sentences, where the participant systems are compared to the manual scores, which range from 5 (semantic equivalence) to 0 (no relation). We combined multiple text similarity meas...